Accessor methods for skiplist nodes #385
Open
This is, ostensibly, a large PR. However, it contains no functional changes.

## Background
When using go-leveldb in go-ethereum, we've often seen `memdb.findGE` pop up as a large node on CPU profiles. This blob then expands further, making it look like the native `bytes.Compare` -> `cmpbody` is slow. However, that is not the case. The slow part is not the comparison itself, but rather the memory accesses being made. A lookup for an item in the memory database can easily do ~30 comparisons for a dataset of a few hundred MB, and around 40 for gigabytes of data (see #384 for some charts about this).
These memory accesses are spread over two large structures: the skiplist metadata (`nodeData`) and the key/value data (`kvData`). In go-ethereum, here are example stats from one of the memory dbs:
**Memory db stats**
The actual data size is `406MB`. The skiplist metadata is `173MB`, almost `43%` of the data. When performing a lookup, the skiplist traversal takes place over this `173MB` memory structure, and the comparisons load from the `405MB` slice. So a `Put` or `Get` operation doing `30` comparisons will do `30+` loads spread across the `173MB` skiplist (`nodeData`), and `30` loads spread across the `405MB` `kvData` structure, to load the keys.

I looked at some other data engines, namely the pebble skiplist and the badger skiplist.
Those engines also use a skiplist, but have spent a lot of time minimizing these structures. They use an *arena*, which is one big byte slice, onto which they use the `unsafe` package to cast slices into object form.
package to cast slices into object-form. That is a bit on the extreme, and not what is done in this PR.Reason for this PR
This PR doesn't actually change anything in the underlying model, but it does introduce accessors to manipulate `nodeData`. The idea is that if the code goes through accessor methods, it becomes easier to experiment with the two things described in the sections below.
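To make this concrete, here is a minimal sketch of what such accessors can look like, following the existing memdb field layout; the method names are illustrative and may not match the PR exactly:

```go
// nodeInt is the backing type for skiplist metadata. Changing the
// representation later only requires touching this definition and
// the accessors below.
type nodeInt int

// Field layout within nodeData for a single node:
// [kvOffset, keyLength, valueLength, height, next_0 .. next_h-1]
const (
	nKV     = iota // offset of this node's key/value data in kvData
	nKey           // key length
	nVal           // value length
	nHeight        // tower height
	nNext          // start of the forward pointers
)

type memDB struct {
	nodeData []nodeInt
}

// Accessors: the rest of the code indexes nodeData only through these.

func (p *memDB) nodeKVOffset(node int) int { return int(p.nodeData[node+nKV]) }
func (p *memDB) nodeKeyLen(node int) int   { return int(p.nodeData[node+nKey]) }
func (p *memDB) nodeValueLen(node int) int { return int(p.nodeData[node+nVal]) }
func (p *memDB) nodeHeight(node int) int   { return int(p.nodeData[node+nHeight]) }

// nodeNext returns the node's forward pointer at the given level.
func (p *memDB) nodeNext(node, level int) int {
	return int(p.nodeData[node+nNext+level])
}

func (p *memDB) setNodeNext(node, level, next int) {
	p.nodeData[node+nNext+level] = nodeInt(next)
}
```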
## Field packing
The current implementation of `nodeData` is a slice of `int`, which is 64 bits wide on a 64-bit platform. Thus, every single field in a `node` takes up `8` bytes. If this is changed into `uint32`, `nodeData` shrinks by `50%`. Furthermore, the `height` field is limited to `12`, and could be packed as a `uint8` into e.g. `keyLength`. In general, this PR enables experimentation with different ways to pack the fields.
With this PR, converting the `int` to `uint32` is as simple as redefining `nodeInt` as `uint32`. None of the changes described below are part of this PR -- they just become very simple to experiment with.
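Assuming the `nodeInt` definition from the sketch above, the `uint32` experiment then becomes a one-line change:

```go
// Redefine the backing type; the explicit conversions inside the
// accessors let the rest of the code compile unchanged.
type nodeInt uint32
```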
I tested this, and got the following charts. The data consists of `400` points taken during the insertion of `4194304` items, each with a `32`-byte key and a `32`-byte value. The Y-axis shows the time (`ms`) that inserting each batch of (around 10K) items took.

**`int` (64-bit) as backing type**

**`uint32` as backing type**

The charts show a slight speed improvement, and a sizeable reduction in memory usage.
**`uint32` as backing type + pack `height` into `keyLen`**

If we use `uint32`, with `24` bits to store the key size, and pack `height` as `8` bits into that field, the metadata goes from `89M` to `72M`. In this run, however, the speed degraded a bit.
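For illustration, such packing could look roughly like this (a sketch; the function names are hypothetical, not from this PR):

```go
// Pack an 8-bit height into the top byte of a 32-bit field,
// leaving 24 bits for the key size (keys up to 16MB-1).
func packKeyLenHeight(keyLen, height int) uint32 {
	return uint32(keyLen)&0xFFFFFF | uint32(height)<<24
}

func unpackKeyLen(v uint32) int { return int(v & 0xFFFFFF) }
func unpackHeight(v uint32) int { return int(v >> 24) }
```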
## KV separation

A separate track to improve the memdb lookup speed would be to split up the keys and values, which currently both reside in `kvData`. This PR makes such experimentation somewhat simpler.